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Abstract 

A fairly general procedure is studied to perturbate a multivariate density satisfying a weak 
form of multivariate symmetry, and to generate a whole set of non-symmetric densities. The 
approach is general enough to encompass a number of recent proposals in the literature, variously 
related to the skew normal distribution. The special case of skew elliptical densities is examined in 
detail, establishing connections with existing similar work. The final part of the paper specializes 
further to a form of multivariate skew t density. Likelihood inference for this distribution is 
examined, and it is illustrated with numerical examples. 
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1 Introduction 



1.1 Motivation and aims 

There is a growing interest in the literature on parametric families of multivariate distributions which 
represent a local departure from the multivariate normal family, in the sense that they exhibit a bell- 
shaped behaviour similar to the normal density, and they can be made arbitrarily close to the normal 
density by regulating a suitable parameter. The phrase 'local departure' must be interpreted appro- 
priately, in the sense that, while these families can approach normality, they also can, under other 
circumstances, exhibit quite a substantial departure from normality. 

The motivation of these efforts is to introduce more flexible parametric families capable of 
adapting as closely as possible to real data, in particular in the rather frequent case of phenomena 
whose empirical outcome behaves in a non-normal fashion but still retains some broad similarity with 
the multivariate normal distribution. Typically this departure from normality occurs in the form of a 
roughly bell-shaped density, but with contour levels not quite elliptically shaped and/or with contour 
levels not quite spaced as the normal density prescribes. 

Some of this literature is connected with the so-called multivariate skew normal (SN) distribu- 
tion, recently studied by Azzalini & Dalla Valle (1996) and Azzalini & Capitanio (1999); this has 
been further developed by other authors whose work will be referenced later in this section. The d- 
dimensional SN density, in the 'standard' form which does not include location and scale parameters, 
is 

2M¥,n)Ha T y), y G (1) 

where 4>d{y\ ^) is the Nd(0, Cl) density at y for some correlation matrix Cl, <£(•) is the N(0, 1) distri- 
bution function and a £ M. d . Here a plays the role of shape parameter; when a = 0, we recover the 
regular normal density. 

As a further level of generalisation of the normal distribution, Azzalini & Capitanio (1999, 
p. 599) have presented a lemma which leads to the construction of a 'skew elliptical' density, which 
is an elliptical density multiplied by a suitable skewing factor, in such a way that the product is still 
a proper density. Branco & Dey (2001) have considered another form of skew elliptical distribution, 
whose connections with the one mentioned above will be discussed extensively in this paper. Other 
work on extensions of elliptical families has been done by Genton & Loperfido (2002), where it is 
shown that distributional properties of certain functions of elliptical variates extends to their skewed 
variants, generalizing a similar result of Branco &Dey (2001). 

Arnold & Beaver (2000a) have studied a variant of (1) which replaces the argument of by «o+ 
a T y, where ao is an additional parameter, with consequent adjustment of the normalising constant. 
The same variant of the SN distribution has been considered by Capitanio et al. (2003) in the context 
of graphical models. Sahu, Dey & Branco (2001) have studied yet another form of skew elliptical 
distribution, where the skewing factor is a d-dimensional distribution function, rather than a scalar 
one like those of the previously mentioned cases. In the same spirit as (1), Arnold & Beaver (2000b) 
have studied a form of multivariate skew Cauchy distribution. For additional references and a recent 
review on the connected literature, see Arnold & Beaver (2002). 

There is therefore a set of interesting developments in various directions aimed at extending (1) 
or adapting the underlying idea to other distributions. While all this activity is definitely promising and 
appealing, it also brings in the question of the inter-relationships among these contributions, which 
tend to appear as scattered in different directions. 
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One purpose of the present contribution is to propose a fairly general extension of (1); in ad- 
dition, a better understanding of the connections and similarities among some of the above-described 
proposals is attempted. A broad formulation is presented in Section 2, and is specialised to a skew 
elliptical form in Section 3. This approach encompasses several of the existing proposals and it ap- 
pears to provide a potentially general framework for special cases. We discuss in some detail a few of 
these and, from Section 4 onwards, we focus on a form of multivariate skew t distribution; since this 
represents a mathematically quite manageable distribution, allowing ample flexibility in skewness and 
kurtosis, it appears to be a promising tool for a wide range of practical problems. Associated likeli- 
hood inference for this skew t distribution and illustrative examples are presented in Section 5. Some 
background information on the SN distribution and the elliptical family is given in the second part of 
this introductory section. 



1.2 Some preliminaries 

The SN distribution Given a full-rank d x d covariance matrix = (u> rs ), define 

u = diag(wi, ...,u d )= diag(wii, . . .,uj dd ) 1/2 

and let = o; _1 ilaj _1 be the associated correlation matrix; also let £, a € A <i-dimensional 
random variable Z is said to have a skew normal distribution if it is continuous with density function 
at z G R d of type 

2^(z-£;ft)$(a T ^- 1 (z-£))- (2) 

We shall then write Z ~ SNd(£, fi, a), referring to £, Q, a as the location, dispersion and shape or 
skewness parameters, respectively. Density (1) corresponds to the 'standard' distribution SN^(0, Cl, a). 

By varying a, one obtains a variety of shapes; Azzalini & Dalla Valle (1996) display graphically 
some instances of them when d = 2. Clearly, when a = 0, we are back to the N d (£, fi) density. The 
cumulant generating function is 

Kz{t) = t T e + \t T m + (o(5 T cut) 

where 

6 = 7 -!- ri72 "«. Co(x) = log{2 $(*)}. (3) 

From the expression for 5 we have 

a = \ fi-M. (4) 

There exists at least two stochastic representations for Z. These are useful for random number 
generation and for deriving in a simple way a number of formal properties. 

o Conditioning method. Suppose that Uq is a scalar random variable and U is a <i-dimensional 
variable, such that 

? " S ■ N d+1 (0,n*), n*=(] (5) 
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where Q* is a full-rank correlation matrix. Then the distribution of (U\Uo > 0) is SN^(0, Q, a) 
where a is a function of 5 and 0; in fact, we can also set 

[U if ?7 > 0, 
\-U ifC/ <0. 

By an affine transformation of the resulting variable one obtains a distribution of type (2). 
o Transformation method. Suppose now that 

where ^ is a full-rank correlation matrix, and define 

Z j = 5 j \U( ) \ + (l-5]) 1/2 U'j, (7) 

where — 1 < 5j < 1 for j = 1, . . . , d. Then (Z±, . . . , Z$) has the <i-dimensional skew normal 
distribution, with parameters which are suitable functions of the S's and 

A third type of representation is known to exist in the scalar case. If (Uq, U\) is a bivariate 
normal variate with standardized marginals and correlation p, then 

max(f7 ,f7i) ~ SN(0,l,a) (8) 

1 /2 

where a = ((1 — p)/(l + p)) 1 . This result has been given by Roberts (1966), in an early explicit 
occurrence of the scalar SN distribution, and later rediscovered by Loperfido (2002); the same con- 
clusion can also be obtained as special case of a result of H. N. Nagaraja, quoted by David (1981, 
Exercise 5.6.4). The generalization of this type of representation to the multivariate setting to obtain 
(1) via a set of max(-) operation on normal variates is an open question. 

Among the many formal properties shared with the normal class, a noteworthy fact is that 

(z-o T n-\z-o~xi (9) 

Other properties of quadratic forms of SN variables are given by Azzalini & Capitanio (1999), Genton 
et al. (2001) and Loperfido (2001). Another important property of this class is closure under affine 
transformations of the variable Z; in particular, this implies closure under marginalization, i.e. the 
distribution of all sub-vectors of Z is still of type (2). 

What is lacking is closure under conditioning, i.e. the conditional distribution of a set of com- 
ponents of Z given another set of components is not of type (2). This property is achieved by a simple 
extension of (2) which has been examined by Arnold & Beaver (2000a) and by Capitanio et al. (2003). 
This variant of the density takes the form 

^(r)- 1 4> d (z - £; Q) $(a + a T u>-\z - £)) (10) 

where r (r 6 R) is an additional parameter and 

ao=(l- 5 T Cl- 1 5y 1/2 t. 

When r = 0, «o = and (10) reduces to (2). Unfortunately, the x 2 property (9) does not hold for (10), 
if r / 0. A form of genesis of (10) via conditioning using (6) is by consideration of (U\Uq + r > 0) . 
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Elliptical distributions We summarize briefly a few concepts about and establish notation for el- 
liptical distributions, confining ourselves to random variables without discrete components. For a full 
treatment of this topic, we refer the reader to Fang, Kotz and Ng (1990). 

A d-dimensional continuous random variable Y is said to have an elliptical density if this is of 
the form 

f(y; n) = pj^ fi(v - T ^Hy - 0}, v e M d , 

where £ G R d , Q is a covariance matrix, / is a suitable function from R + to R + , called the 'density 
generator', and c d is a normalising constant. We shall then write Y ~ Ell^(^, Q, f). 

The basic case is obtained by setting f(x) = exp(— x/2) and a = (2ir)~ d / 2 , leading to the 
multivariate normal density. Two other important special cases, which will be used extensively in the 
sequel, are provided by the multivariate Pearson type VII distributions, whose generator and normal- 
ising constant are 

u _ r(M) 



f(x) = (l + x/v)- M , c d 



(irv) d / 2 T(M - d/2) ' 
where v > 0, M > d/2, and by the multivariate Pearson type II distributions for which 

r(d/2 + i/+l) 



f(x) = (l-xY, c d 



■K d / 2 T(v + 1) 



where < x < l,v > — 1. The special importance of type VII lies in the fact that it includes 
the multivariate t density when M = {d + v)/2, hence also the Cauchy distribution. For these 
distributions, we shall use the notation PVII^(^, Q,, M, v) and PII^(^, $7, v), respectively. 

A convenient stochastic representation for Y is 

Y = £ + RL T S (11) 

where L T L = Q, the random vector S is uniformly distributed on the unit sphere in W 1 and R 
is a positive scalar random variable independent of S, called the generating variate. An immediate 

consequence of this representation is that (Y — ^) T $7 _1 (y — £) = R 2 , where = means equality in 
distribution. 

Elliptical distributions are closed under affine transformations and conditioning. In particular 
they are closed under marginalization, in the following sense: consider the block partition Y T = 
(Yi, Y" 2 T ) where Y\ G R h and a corresponding partition for £ and U; then 

yi~Eii h (£i,nii,/i) 

Similarly, for the conditional density we have 

(Yi\y 2 = V2 ) ~ eiu(6 + n 12 n^(y 2 - £ 2 ),n n - n 12 n£n 21 ,py). 

where Q y = yJ^l^V'i- ^ ne density generators /i and f® y are not necessarily of the same form as /. 
Kano (1994) has shown that the form of the density generator is preserved under marginalization only 
in the case of elliptical distributions which can be obtained from a scale mixture of normal variates. 
This property is true, for instance, for multivariate Pearson type VII and II distributions. The generator 
fQ y of the conditional distribution depends in general on the quantity Q y , with the exception of the 
normal distribution. 
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2 Central symmetry and distributions obtained by its perturbation 



Our starting point is the following proposition which is closely connected to Lemma 1 of Azzalini 
& Capitanio (1999). Strictly speaking, the present statement is a bit more restricted than the earlier 
result, but it has the major advantage of requiring a set of conditions whose fulfillment is far simpler 
to check, and still it represents a very general formulation. 

The result refers to central symmetry, a simple and wide concept of symmetry, which is com- 
monly in use in nonparametric statistics; see Zuo & Serfling (2000). Other authors refer to the same 
property with alternative terms. A d-dimensional random variable Y is said to be centrally symmetric 

around a point £ if Y — £ = £ — Y. Since we shall deal with continuous variables, the above require- 
ment implies that the corresponding density function / satisfies f(y — £) = /(<!; — y) for all y 6 R d , 
up to a negligible set. It is immediate to see that the condition of central symmetry is satisfied by 
various ample families, notably the elliptical densities, but also many others; some examples are the 
symmetric stable laws, the Watson rotational symmetric densities, the class of distributions studied 
studied by Szablowski (1998), among many others. 



Proposition 1 Denote by f(y) the density function of a d-dimensional continuous random variable 
which is centrally symmetric around 0, and by G a scalar distribution function such that G(—x) = 
1 — G(x)for all real x. Ifw(y) is a function from M. d to R such that w(—y) = —w(y)for all y 6 R rf , 
then 

2f(y)G{w(y)} (12) 

is a density function. 



Proof. Denote by Y a random variable with density /, and by X a random variable with distribution 
function G, independent of Y. To show that W = w(Y) has a distribution symmetric about 0, denote 
by A a Borel set of the real line and by —A its mirror set obtained by reversing the sign of each 
element of A. Then, taking into account that Y and — Y have the same distribution, 

P{W e -A} = P{-W €A}= F{w(-Y) eA} = F{w{Y) G A} , 

showing that W has the property indicated. Then, on noticing that X — W has distribution symmetric 
about 0, write 

\ = P{X < W} = E Y {F{X < w(Y)\Y = y}}= [ G{w(y)} f(y)dy 

which completes the proof. 

To demonstrate graphically the ample flexibility attained by (12) for appropriate choices of /, 
G, and w, we present the following example in the case d = 2. Consider the non-elliptical distribution 

obtained by multiplication of two symmetric Beta densities rescaled to the interval (—1,1), with 
positive parameters a and b. We perturb this density by choosing 

n( x e x sm(pi yi + p 2 y 2 ) 



l + e x ' 1 + cosfeyi + q 2 y2) 
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where pi,P2, qi and qi are additional parameters. 

We have generated several plots of the above type of density, obtaining an extremely rich set of 
surfaces, as indicated by the small collection of such densities given in Figure 1. Additional regulation 
of the shape could be achieved, by inserting parameters in the logistic function G(x), although it is 
doubtful that one would need the latter level of additional flexibility. The plots indicate that the effect 
of perturbing / via (12) is far more complex than the effect introduced, say, by the skewing factor of 
the normal density in (2). Clearly, the purpose of Figure 1 is purely illustrative, and it is not suggested 
to use the above class of density functions in practice without further investigation. 

For a random variable with density (12), the stochastic representation given by Azzalini & 
Capitanio (1999, p. 599) for a slightly different case is still valid. In fact, the conditions required there 
for its validity are actually those of Proposition 1 . Specifically, if Y has density function / and X is 
an independent variable with distribution function G, then 

Y i£X<w(Y) 

-Y i£X>w(Y) U ' 

has density function (12). Clearly, this provides an algorithm for generating Z and it will also turn out 
to be useful for theoretical purposes. 

It can be shown that the conditioning method for generating skew normal random variables 
from (5) is a special case of (13). In fact, from consideration of the residual part of Uq after removing 
the regression on U, define the variable 

X = - (l-8 T n- 1 S^ (U - 5 T n' 1 U) ~ iV(0,l), (14) 

independent of U. After substituting symbols, the condition X < a T U of the top branch of (13) is 
equivalent to Uq > 0, if a is given by (4); hence it generates a SN^(0, Cl, a) variable if we set Z = U. 
The condition of the lower branch is equivalent to —X < a J (-U) leading to a SNd(0, Q, — a) 
variable if we set Z = U, hence to a SN^(0, Q, a) variable if we set Z = —U. 

Similarly, the stochastic representation of a variate with density (10) via (U\Uq + r > 0) could 
be reformulated in terms of the condition X < ao + a T U. In general, the existence of a similar 
correspondence would be unclear if the assumption of normality in (5) was replaced by some other 
distributional assumption. Luckily, a suitable transformation analogous to (14) can be obtained in a 
few important special cases to be discussed in Section 3. 

It is immediate that, if / is an elliptical density, G corresponds to a distribution symmetric about 
and w(y) = a T y for some a 6 M. d , then the conditions required by Proposition 1 are fulfilled. We 
then obtain the family of densities produced by Corollary 2 of Azzalini & Capitanio (1999). 

Proposition 2 Denote by Y and Z two d-dimensional random variates having density function f and 
(12), respectively, satisfying the conditions of Proposition 1. If t(-) is a function from M. d to some 
Euclidean space, such that t(—y) = t(y)for all y G M d , then 

t(Y) = t(Z) . 

Proof. This is immediate from representation (13). 

A key example of the above result is obtained when t(y) represents the distance from the origin. 
Since any choice of t(-) must satisfy the symmetry condition t(y) = t(—y), then the probability 



7 



(a,fc>,p,q) = ( 2 , 3,(3,3), ( O , O )) 



(a,fc>,p,q) = ( 2 , 3,(8,8), ( O , O )) 




distribution of the distance of a random point from the origin is the same for Y and for Z. In particular 

we can write Y T BY = Z T BZ for any positive definite matrix B. A result similar to Proposition 2 
for the case when / is an elliptical distribution has been given by Genton & Loperfido (2002). 

A related set of applications of Proposition 2 is offered by various results on quadratic forms of 
skew normal variates, all of which lead to the conclusion that known distributional results for normal 
variates still hold if the variates are of skew normal type. This set of results includes Proposition 
7, 8 and 9 of Azzalini & Capitanio (1999) and Proposition 1, 2 and 6 (parts 1 and 3) of Loperfido 
(2001). For these conclusions, one must consider functions t(-) in Proposition 2 taking on values in 
an appropriate Euclidean space, for instance M. + x R + if the independence of two quadratic forms 
is under consideration. Notice that Propositions 8 and 9 of Azzalini & Capitanio (1999) have added 
conditions on the a parameter, but these are not necessary. There is no conflict with the present 
conclusions since in their Proposition 8 this extra condition is part of a sufficiency requirement, and 
their Proposition 9 (a Fisher-Cochran type of theorem) was stated in a more restricted form than 
actually possible. 

We conclude this section with a discussion on possible generalisations of Proposition 1 . A very 
general form of density resembling (12) is along the following lines. Denote by X = (Xi, . . . , X m ) T 
an m-dimensional random variable with distribution function G, by Y an independent <i-dimensional 
random variable with density function /, and by wi(y), . . . , w m (y) a set of functions from R d to M. 
For the moment, we remove any assumptions on /, G and the w^s; there is no loss of generality 
in assuming Wi(0) = 0, since otherwise Wi(0) could be absorbed into the 6j's to be introduced in a 
moment. Then 

p' 1 G{ Wl (y) + 6i, . . . , w m (y) + b m } f(y) (15) 
is a density function for any choice of the real numbers bi, . . . , b m , if 

p = ¥{X 1 -w 1 {Y) < h,...,X m -w m (Y) < b m }. 

The statement follows immediately from the fact that 

p = E Y {F{X 1 -wi(y)<b 1 ,...,X m -w m (y)<b m \Y = y}} 
= / G{w\{y) + b 1 ,...,w m (y) + b m } f(y)dy. 

Clearly, the difficulty is in computing the normalising constant p. This task is amenable when 
X and Y are multivariate normal variables. A rather simple special case of (15) is given by (10) 
where G is the scalar normal distribution function, and / is cf)d( x ] An instance of density (15) 
with multivariate G is given by Sahu et al. (2001); in their case, / is the <i-dimensional normal 
density, G is the d-dimensional normal distribution function, the Wj 's are d linear combinations of y 
and all b/s are 0. The multivariate distribution sketched by Azzalini (1985, section 4) and the multiple 
constraint model outlined by Arnold and Beaver (2000a, section 6) has a G which is the product of m 
(m > 1) terms of type ^(aiyi) or &(ajy + bi), respectively. The 'general multivariate skew normal 
distribution' mentioned by Gupta, Gonzales-Farfas and Dommguez-Molina (2001, section 5) is even 
more general since they adopt a G which is the the m-dimensional normal distribution function. 

When / or G or both, in (15), are not of Gaussian type, evaluation of p is generally much more 
problematic. Some form of restrictions must however be imposed, not only to make the problem 
tractable but also because it has little meaning to consider (15) in its full generality which is so broad 
as to lose nearly any structure. A reasonable setting is as follows: suppose that / and G are both cen- 
trally symmetric and Wi(—y) = —Wi(y) for all y € R d . Then, by using essentially the same argument 
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as in the proof of Proposition 1, one concludes that W = {W\, . . . , W m ) = (w\(Y), . . . , w m (Y)) 
is centrally symmetric; therefore so is V = {X\ — W±, . . . , X rn — W m ), by using the properties of 
centrally symmetric functions. A tractable instance of this setting is offered by the skew Cauchy dis- 
tribution and its variants discussed by Arnold and Beaver (2000b), using a univariate G. Exploration 
of other cases along the direction sketched above seems very interesting but far beyond the scope of 
the present paper. 

3 Skew elliptical densities 

This section focuses on an important subclass of (12) with the component / of elliptical form, aiming 
at three main goals. The first is to prove that the two forms of skew elliptical densities introduced by 
Azzalini & Capitanio (1999, p. 599) and by Branco & Dey (2001) are closely connected. The second 
goal is to show that the relationships among the three forms of stochastic representation of a skew 
normal variate recalled in Section 1 .2 carry over to skew elliptical variates. Furthermore, an analogue 
of stochastic representation (11) for elliptical variates is obtained for skew elliptical ones. 

3.1 Skew elliptical densities by conditioning 

For simplicity of presentation, we shall work with correlation matrices, and location parameter 0. For 
the rest of this section, U* denotes a (d + 1) -dimensional variate partitioned into a scalar component 
Uq and a d-dimensional vector U. 

Branco & Dey (2001) have introduced a class of skew elliptical distributions generated by apply- 
ing to a (d + 1) -dimensional elliptical variate the same conditioning method described in Section 1.2 
in connection with the SN distribution. The following proposition recalls their key statement, up to 
some inessential changes of notation. 

Proposition 3 Consider the random vector U* ~ EfLi +1 (0, fl*, /) where fi* is defined in (5). Then 
the probability density function of Z = {U\Uq > 0) is 

2fu(z;Q) ^ Z cif Q *(y 2 )dy (16) 

J — oo 

where 

Q z = z T n- 1 z, (17) 

the vector a is defined in (4), fu is the density ofU, f® z {-) is the density generator of {Uq\U = z) 
and c\ is the associated normalizing constant. 

For later use, note that an alternative expression for (16) is 

2 J™ c d+1 f(u* T (n*)- l u^ \n*\~ 1/2 du . (18) 

On defining F® z (•) to be the distribution function corresponding to the density generator f® z (•), 
the above result lead Branco & Dey (2001) to re-write (16) in the form 

2f u (z;n)F^(a T z) (19) 
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where the distribution function F® z is actually varying at each selected point z. This expression 
appears to be different from (12) where a fixed distribution function F is involved. 

However, when the quantity Q z can be removed from the argument of the integral in (16) by 
means of a suitable change in variable, the resulting density function will become 

2 fu(z; fi) F{w(z)} (20) 

where F is a univariate distribution function and w is such that w(z) = h(a T z, z 1 U _1 z) for some 
function h from R x R + to R. It is easy to show that the property w(— z) = —w(z) must hold; hence 
(20) is of type (12). 

It is difficult to state general conditions under which a density of type (19) can actually be 
transformed into one of form (20), but special cases where this is indeed feasible do exist. We shall 
now examine in detail two important cases of this form, namely when U* has either a PVIId+i or a 
Plld+i distribution, which are among those considered by Branco & Dey (2001). 

Proposition 4 If the random vector U* has a PVII<i+i(0, Q*, M, v) distribution, then the probability 
density function of Z = {U\Uq > 0) is 

2f u (z;Q)F 1 (a T z(u + Q z )- l/2 ;M,l), z G (21) 



where Q z is given by (17), fu is the density of a PVII^(0, J7, M — 1/2, u) and Fi(-;M, 1) is the 
cumulative probability function of a PVIIi(0, 1, M, 1). 

Proof. Using results in Fang, Kotz and Ng (1990, pp. 82-83), we have 
and 

f ( o\ - r(M-i/2) / qa ~ M+1 ' 2 

Ju ^ Z ' U) - \n\^(TTu)^F(M-(d+l)/2) V v ) 

i.e. the densities of a PVIIi (0, 1, M, v+Q z ) and of a PVII rf (0, M- 1/2, v) variate with parameters 
M — 1/2 and v, respectively. On setting x = y (v + Q z )^ 1 ^ 2 , the integral in (16) becomes 



a'z(u+Q z )-^ ,,,,, 

(l + x 2 )- M dx 



r(M) 



7r i/2r(M - 1/2) 

which is the distribution function of aPVIIi(0, 1, M, 1) variate evaluated at the point a T z [y + Qz)^ 1 ^ 2 . 
QED 

Example 1: skew t distribution. The relevance of the PVII^ class is due to the inclusion of the 
multivariate t family as the special case when M = (d + v)/2. The corresponding specification of 
Proposition 4 produces then a form of multivariate skew t density. Since Section 4 will be entirely 
dedicated to this distribution, we defer detailed discussion until then. 

Proposition 5 If the (d + 1) -dimensional elliptical random vector U* has a PIIrf + i(0, £1*, v) distri- 
bution, then the probability density function of Z = (U\Uo > 0) is 

2f u (z;Cl)F 1 (a T z (1 - Q,)' 1 / 2 ;u) , ze (-l,l) d , (22) 
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where Q z is given by (17), fy is the density of a PIId(0,Sl,^ + 1/2) variate, and F\(-;v) is the 
distribution function of a PIIi(0, 1, v). 

Proof. Identical to that of Proposition 4, considering the densities of marginal and conditional distri- 
butions of PII, as defined in Fang, Kotz and Ng, (1990, pp. 89-91). 

The absence of Q z in the conditional density characterizes the multivariate normal distribution 
among the members of the elliptical family. This fact can be used to produce an analogous character- 
ization of the skew normal distribution within the skew elliptical family. 

Proposition 6 The function w in (20) is such that w(z) = a T z if and only ifU* is Gaussian, i.e. Z 
is skew normal. 

Proof. The density of (U\Uo = z) does not depend on Q z if and only if U* is Gaussian; see 
Theorem 4.12 of Fang et al. (1990). In this case, the integral in (16) becomes &(a T z), so that 
Z ~ SN d (0,fi,a). QED 

A number of parallels between the skew normal distribution and other types of skew elliptical 
distributions have already been shown. The next result allows us to construct a random variable X 
playing a role analogous to the one in (14) for the skew version of a PVII^ and PII^ distribution, 
respectively. 

Proposition 7 Let U* ~ PVII d+1 (0, Q*,M, v). Then 

x = -(i- sn-^y 112 (u - 5 T n~ 1 u) (u + ^n- 1 ^ ~ pviii(o, i, m, i), 

independent ofU. IfU*~ PII (J+1 (0, 0*, v) then 

I Icy 

x = -(i- sn-^y 112 (u - 5 T n- 1 u) (i - c/Tfr 1 ?/) ~ pii^o, i, v), 

independent of U. 

Proof. By direct calculation. 
Therefore, we can set 

Z= (U if X<w(U), 
\-U if X>w(U), 

where w(z) is the transformation of z used in the argument of F\ in (21) and (22), respectively; it is in- 
tended that the appropriate distribution of U* and transformation X have been selected. This formula 
establishes a method of type (13) to generate a skew PVII^ and skew PII^ variate, respectively. 

The connections between the proposal of Azzalini & Capitanio (1999) and the one of Branco & 
Dey (2001) can be summarised as follows. The conditioning argument which is one of the mechanisms 
to generate the skew normal distribution from the normal one can be adopted to generate a form of 
skew elliptical distributions from the elliptical ones, leading to (19), or some similar form as obtained 
by Branco & Dey. This type of expression can, at least in some important special cases, be transformed 
into one where the skewing factor of / is a fixed distribution function, as shown by (21) and (22). 
These expressions are of type (12), which is essentially the form of Azzalini & Capitanio. The natural 
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question is whether all densities of type (19) can be re-written in the form (12), but we have been 
unable to prove this fact in general. Notice that the converse inclusion is not true, that is, not all 
densities of type (12) can be written in the form (19), unless additional restrictions are imposed on the 
components of (12), besides the obvious condition that / is elliptical. 

The next result concerns a stochastic representation of type (11) for distributions of type (12) 
when the density / is elliptical. For example, this representation is valid for the skew elliptical den- 
sities defined in Azzalini & Capitanio (1999, p. 599) and for the skew versions of PVII^ and PII^ 
examined earlier. 



Propositions If Z has a density of type (12), where f is the density ofU~ ElLj(£, fi,/), then Z 
admits the stochastic representation 

Z = i + RL T S' (23) 

where Cl = L T L, R > has the same distribution as the radius of the stochastic representation (11) 
of U, and S' has a non-uniform distribution on the unit sphere of R d . Specifically, using spherical 
coordinates, the density of S' is equal to 

71 k=l 

where w* L (-) is a function from M. d to R defined in Appendix A, and X is an independent random 
variable having distribution function G. Furthermore, the conditional distribution of S' given R = r 
is of type (12), with density 

71 k=i 



Proof. In Appendix A. 

Example 2: Stochastic representation (23) for skew normal distribution. If Z ~ SNd(£, Cl, a), then 
by applying Proposition 8 we obtain R? ~ an ^ tne following spherical coordinates representation 
of the marginal distribution of S': 

P{X < R (a\ cos 9\ + a\ sm#i cos 62 + ■ ■ ■ + a* d sin 9\ ■ ■ ■ sin 6^-1)} , 

where 6 = (9±, 82, ■ ■ ■ , 6d-i) T , a* = La and X ~ N(0, 1) is independent of R. Finally, noticing 
that d 1//2 X R~ x has a t distribution with d degrees of freedom, we have 

mo) = s^nW)"- 1 

Ti{d l l 2 (aX cos^i + a\ sin^i cos 62 + . . . + a* d sva.9\ ■ ■ ■ sin^^x); d} 
where T\(-;d) is the distribution function of a scalar t distribution with d degrees of freedom. 
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3.2 Skew elliptical densities by transformation method 

The next result shows how the class of skew elliptical distributions mirrors another property of the 
skew normal distribution. In fact the class of skew elliptical densities obtained via the conditioning 
method is equivalent to the one obtained by applying the transformation method recalled in Section 
1.2. 

Proposition 9 Consider the random vector (Uq, U) ~ ElLj +1 (0, /) where ^* is as in (6), and 
define 

Zj = 5 j \U \ + (l- 5]) 1/2 Uj, j = l,...,d, (24) 
where —1 < Sj < 1. Then the density of(Z±, . . . , Z<j) is of type (16), where 

X t = Siil-dfY 1 ' 2 , (i = l,...,d), 
A = diag{(l + A?)- 1/2 ,...,(l + A^)- 1/2 }, 
fl = A(* + AA T )A, 

a = (l + A T ^A) _1/2 A- 1 ^- 1 A. 

Proof. First note that the joint density function of \Uq\ and U takes the form 2cd+\f{). Denote by B 
the (d+ 1) x (d+ 1) matrix implicitly defined by (24) such that (Z Q , Z 1: . . . , Z d ) T = B(\U \, U T ) T , 
and apply the usual formulae for linear transforms. Then the density function of (Z\, . . . , Zj) turns 
out to be 

2 c d+ J ((^o, z T )A-\z , z T ) T ) |A|-V2 dxo 

where A = B^*B T is a correlation matrix. Taking into account expression (18) the result follows. 
QED 

An immediate consequence of the transformation method is a further generating method for the 
bivariate case. Again, this reproduces for the skew elliptical family a generation method known to 
hold for the skew normal distributions. 

Proposition 10 If (Uq, U) ~ Ell2(0, $7*, /), the class generated by Z = max(C/o, U) is equal to the 
class generated by the transformation method of Proposition 9 with d = 2. 

Proof. First notice that max(C/o,^) = \\U — Uq\ + \{U + Uq). As the joint distribution of 
(U-U ) (2 - 2p)' 1/2 and (U+U ) (2 + 2p)' 1/2 is E11 2 (0, /, /), where p denotes the off-diagonal el- 
ements of Vt* , the result follows by direct application of Proposition 9 on imposing 5 = (^(1 — p)) 1 ^ 2 . 
QED 

4 A skew t distribution 

For the rest of the paper we shall focus on the development of an asymmetric version of the multi- 
variate Student's t distribution, already sketched in Section 3.1. The purpose of the present section is 
to provide additional support for its definition and to examine more closely its properties. Connected 
inferential aspects will be discussed in the subsequent section. 
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4.1 Definition and density 



The usual construction of the t distribution is via the ratio of a normal variate and an appropriate 
transformation of a chi-square. If one wants to introduce an asymmetric variant of the t distribution, 
a quite natural option is to replace the normal variate above by a skew normal one. 

A preliminary result on Gamma variates is required. We shall say that a positive random variable 
is distributed as Gamma(^, A) if its density at x (x > 0) is 

■ x^~ l exp(— A x). 



r(V0 

Lemma 11 IfV ~ Gamma(^, A), then for any a, b € R 

E^(aW + 6)} = P<jV < ay^p/x} 

where T denotes a non-central t variate with 2-ip degrees of freedom and non-centrality parameter 
-b. 

Proof Let U ~ N(0, 1); then 

E^(aW + &)} = E V {P{U <a^ + b\V = v}} 

= E y |p|([/-6)/(«A/V) 1/2 < a(V/A) 1/2 |y = u}} 
= P|T' < a(^/A) 1/2 } 

where T = (U — b)/ (V\/ip) 1/2 has the quoted t distribution. QED 

As anticipated earlier, we define the skew t distribution as the one corresponding to the trans- 
formation 

Y = £ + V~ 1/2 Z (25) 

where Z has density function (2) with £ = 0, and V ~ X 2 / 17 ' independent of Z. An equivalent 
interpretation of Y is to regard it as a scale mixture of SN variates, with mixing scale factor V~ 1 / 2 . 
Application of the above lemma to a Gamma(^, \ v) variate and some simple algebra lead to the 
density of Y, which is 

f Y (y) = 2 t d (y; v) T x (a T u-\y - £) ( ' ;» + d) (26) 



Qy + V 



where to is defined at the beginning of Section 1 .2, 

Q, = (y-i) T n- 1 (.y-(). 

is the density function of a d-dimensional t variate with v degrees of freedom, and T\ (x; u+d) denotes 
the scalar t distribution function with u + d degrees of freedom. We shall call distribution (26) skew 
t, and write 

y~St d (£,fi,a,i/). (27) 
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It is easy to check that density (26) coincides with the one sketched in Section 3.1 using Propo- 
sition 4, which is of type (12). Moreover, for the reasons explained in that section, (26) coincides in 
turn with the skew t distribution of Branco & Dey (2001), although this equality is not visible from 
their derivation because they did not provide the above closed-form expression of the density. 

Therefore, we have seen that a number of different ways to define a skew t distribution all 
lead to the same density (26). While additional proposals to introduce a form of a skew t density 
are possible, this one has the advantage of arising from various generating criteria, which in turn are 
linked to other portions of literature. 

A reviewer of this paper has remarked that, if we set d = 1, density (26) does not reduce to 
the form 2ti(y; v) T\{ay\ v), which seems to be the 'most natural' univariate form of skew t density 
generated by Lemma 1 of Azzalini (1985), a forerunner of Proposition 1. While the latter density 
has the appeal of a slightly simpler mathematical expression, the arguments indicated in the previous 
paragraph lead us to prefer (26). In fact, one could reverse the reasoning, and claim that Lemma 1 of 
Azzalini (1985) 'should' had been stated in the form of Proposition 1 for d = 1; in other words, there 
is no reason to restrict w(y) to the linear form ay, especially outside the normal case. 

Alternative proposals of univariate skew t distributions have been made by Fernandez & Steel 
(1998), constructed similarly to the so-called two-piece normal density, and by Jones (2001), devel- 
oped by Jones & Faddy (2002), which is based on a suitable transformation of a beta density. A 
multivariate form of skew t distribution has been proposed by Jones (2002) but the associated in- 
ferential aspects have not been discussed. The alternative form of multivariate skew t distribution 
considered by Sahu et al. (2001) concides with (26) in the case d = 1; for general d, their density 
involves the multivariate t distribution function. The density examined in this paper allows a rela- 
tively simple mathematical treatment, and it is more naturally linked to the skew normal distribution, 
via mechanisms already mentioned. As a consequence, the distribution enjoys various useful formal 
properties, which will be examined in the remaining part of this section. 

4.2 Some properties 

Distribution function For simplicity of exposition, we obtain the distribution function of Y in the 
'standard' case with £ = 0, fi = fi. Bearing in mind the representation of Z based on conditioning, 
write 

IP{^ < y} = P<jy~ 1/2 Z < y j 

= ¥{V~ 1/2 U< y\ U > O} 

_„{v*» (>),(;)} 

= 2 p{r £ (»)} 

where (Uq, U) has distribution (5), and the inequality signs are intended componentwise. The last 
expression involves the integral of a multivariate (d + 1) -dimensional t variate T' with dispersion 
matrix similar to the one of (5), but with reversed sign of S. Algorithms for computing this type of 
distribution function are given by Genz & Bretz (1999). 
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An alternative expression for the above distribution function is given by 

F{Y <y} = P{v- 1/2 U < y | U > o} = E v ^F z (yv 1/2 )\V = t>} , 

where Fz denotes the distribution of Z, hence evaluating the distribution function of Y by suitably 
averaging the distribution of Z with respect to the distribution of V. This expression is most useful 
in the case d = 1 where a practical expression of Fz is available; see formula (4) and subsequent 
remarks of Azzalini (1985). 



Moments Using the representation (25), it is easy to compute the moments of Y. For algebraic 
convenience, we assume £ = throughout. If E{y (m) } denotes a moment of order m, write 

E |y(m)j =E |y-m/2j E j Z (m)J 

where Z has density function (2) with £ = 0. It is well-known that 



(28) 



E 



_ m/2l _ (,/2r/ 2 r(i(,- m )) 



|y-m/2j 



r(^) 



while, for the expressions of E j, we use results given by Azzalini & Capitanio (1999) and by 

Genton et al. (2001). 



First, we apply (28) to the scalar case. On defining 

fj, = S 



, , ,1/2 r(^-i)) 

V{\v) 



(29) 



one obtains, for £ = 0, 



E{Y} 

E{y 2 } 
E{y 3 } 

E{y 4 } 



LO 



i/-2' 



uj 3 fi(3-5 2 ) 



i/- 3' 



3 v 



{v-2){v-±y 



provided that v is larger than the corresponding order of the moment; the first two of the above 
expressions have been given by Branco & Dey (2001). After some algebra, the indices of skewness 
and kurtosis turn out to be 



7i 



72 



i/(3 - 5 2 
u-3 

(y- 2 ){v-A) 



v — 2 



v-2 



-3/2 



(if v > 3), 



4^(3 -8 2 ) | 6^ 4 



i/-3 



z/-2 



- A* 



- 3 



(if i/ > 4). 



In the multivariate case, we obtain from (28) that E{Y} = oofi still holds, provided v > 1 and 
(29) and u; are intended in vector and matrix form, respectively; furthermore 



E 



(if v > 2) , 
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leading to 



var{y} = 



v-2 



£1 — LOflfJ, <jJ . 



Linear and quadratic forms Consider the affine transformation a + AY where a £ W 11 and A is a 
m x d constant matrix of rank m. Using (25) we can write 

a + AY = Z' + V~ 1/2 AZ 

where £' = a + A£. Take into account that 

AZ ~ SN m (0,AnA T ,a') 

on the ground of results given by Azzalini & Capitanio (1999) where the explicit expression for a' is 
given; similar results, but in a more convenient form, are provided by Capitanio et al. (2003, Appendix 
A.2). Therefore we obtain 

a + AY~ St m (£', AQA T , a', v). 
In particular for a single component, Y r say (r G {1, . . . , d}), one has 



where a' r is given by (10) of Capitanio et al. (2003). 

Similarly, for a quadratic form, Q = (Y — £) T B(Y — £), where B is a symmetric d x d matrix, 
we can write 



For appropriate choices of B, the distribution of Z J BZ is xt' f° r some value v' of the degrees of 
freedom. One such case is (9), where B = f^ 1 . Azzalini & Capitanio (1999, Section 3.3) consider 
more general forms of B; see also Genton et al. (2001) for additional results. In all cases when the 
X 2 property holds for Z, we can state immediately 



This property allows us to produce Healy's-type plots (Healy, 1968) as a diagnostic tool in data 
fitting, similarly to the Normal and SN case, just using the Snedecor distribution as the reference 
distribution instead of the \ 2 - This device will be illustrated in the subsequent numerical work. 

An extended skew t distribution If the component Z in (25) is taken to have distribution (10) rather 
than (2), we obtain a density which parallels the role of (10) for skew t densities; this is now discussed 
briefly. 

By using again Lemma 1 1 , the new density turns out to be of type (26), except that T\ refers now 
to a t distribution with non-centrality parameter — r(l — 5 T Q~ 1 5)~ 1 / 2 and the normalizing constant 
2 is replaced by l/<3?(r). The distribution function is obtained with the same sort of argument of the 
case r = 0, namely 



Y r ~ St(£ r , uj rr ,a' r ,v) 



Q = Z ] BZ/V. 



Q/v* ~F(v',v). 
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where now T" refers to a non-central multivariate t; unfortunately, the latter distribution function is 
appreciably harder to compute in practice than the analogous one for the central case. Moments can 
be computed again with the aid of (28). Those of the first and second order are, if £ = 0, 

E{Y} = EjV^ 1 / 2 j Ci(r )lo5, (u>1), 
E { yyT } = ^ (^ + [C 2 (r) + Ci 2 (r)]^5M) T ), (u>2), 

where 

Cr(x) = d^ C ° (x) ' (r = l,2,...), 

and Co is defined by (3). 

5 Statistical aspects of the skew t distribution 
5.1 Likelihood inference 

Consider n independent observations satisfying a regression model of type 

Vi ~ St d (&,fi,a,i/), Ci = P T Xi 
for i = 1, . . . , n; here Xi is a p— dimensional vector and (3 is a p x d matrix of parameters. Also let 

X = (x 1 ,x 2 , ■ ■ -,x n ) T 

be the nx p design matrix. Notice that we are effectively considering a multivariate regression model 
with error term of skew t type. It would be inappropriate to use such a distribution, and in fact even 
a regular elliptical distribution, for the joint modelling of the n observations, since usually these are 
supposed to behave independently. 

It is convenient to reparametrize the problem by writing 

Q- 1 = v4 T diag(e-^)A = A T DA, and 77 = uu^a 

where A is an upper triangular d x d matrix with diagonal terms equal to 1 and p G M. d . The 
loglikelihood function for the parameter 9 = (/3, A, p, 77, log v) is then 

n 

*(0) = J>(0) (30) 
i=\ 

where £i(9) is the contribution to the loglikelihood from the 2-th individual; this term is 

iiiO) = log 2 + \ log \D\ + log g d {Qi- u) + logTi(t(Lj, Q h v);v + d) 

where 

Ui = m- P T Xi, Qi = ujQ^Ui, U = a t(L, Q,u) = L 



v + d \ 
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Maximisation of this log-likelihood function must be accomplished numerically. To improve effi- 
ciency, the derivatives of (30) can be supplied to an optimisation algorithm; details for computing 
these derivatives are given in an appendix. 

A suite of R routines for evaluating the above log-likelihood and its derivatives has been devel- 
oped, and it is available on the WWW at http ://azzalini.stat. unipd. it/SN. 

In connection with the skew normal distribution, Azzalini (1985) and Azzalini & Capitanio 
(1999) have highlighted some problematic aspects of the likelihood function. A key feature is that 
the profile log-likelihood function for a always has a stationarity point at a = 0, which in turn 
is connected to singularity of the information matrix at a = 0. These problematic features were 
the motivation to introduce an alternative parametrization which overcomes most if not all of these 
problems. 

It was a pleasant surprise to find that in the present setting the behaviour of the log-likelihood 
function was to be much more regular, at least for those numerical cases which we have explored. A 
graphical illustration of this statement is given by Figures 5 and 8 below, which show some profile 
log-likelihood plots. These plots refer to specific datasets, but a similar regularity was found with 
some other datasets which we have considered. 

It would be useful to have some theoretical insight on why the log-likelihood function using 
the skew t distribution behaves so differently from the skew normal model, as well as to gather more 
numerical evidence of its behaviour. However this theme appears to be a project on its own, and 
cannot be pursued here. 

On another front, Fernandez & Steel (1999) have highlighted difficulties in regression models 
when the error term is assumed to have a t distribution with unspecified degrees of freedom to be 
estimated from the data. Specifically, their Theorem 5 states there are points of the parameter space 
where the likelihood function becomes unbounded, if the degrees of freedom are allowed to span over 
the whole range v € (0, oo). To avoid this effect, one must restrict the range of v to the interval 
(uo, oo), where the threshold uq is a function of X and y. For instance, in the case of a simple random 
sample with no ties in the y^'s, we obtain uq = d/(n — 1), which imposes a very mild limitation. For 
the stackloss data example discussed by Fernandez & Steel (1999) with d = 1 and p = 3, the value 
of vq is small, 8/13. In addition, they recall some numerical examples from the literature where poles 
have been found by various authors; in all these cases, however, these poles where found at values of 
v very small, always below 0.30. 

Therefore, in practice the difficulties can be circumvented by avoiding a certain portion of the 
parameter space which would be somewhat peculiar anyway. However, the fact that uq depends on 
the response variable leads to a procedure which lacks complete support by the theory of likelihood 
inference. As advocated by Fernandez & Steel, a better theoretical understanding of this sort of model 
and the associated log-likelihood properties is therefore called for. 

It is plausible that regression models with skew t error terms behave quite similarly to analogous 
cases which employ a regular t distribution, as for the phenomenon discusses by Fernandez & Steel 
(1999). In the numerical work of the next subsection, we have been driven by considerations described 
above, and decided to ignore poles of the log-likelihood very near v = 0. We have however searched 
for them, but the only case where we have successfully located one was with the stackloss data, near 
v = 0.06, while the maximum above the threshold vq = 8/13 was at z> = 1.14. 
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PP— plot for normal distribution 



PP— plot for skew— t distribution 
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Figure 2: AIS data: Healy's plot when either a normal distribution (left-hand side panel) or a skew t distribution 
(right-hand side panel ) is fitted to the data 

5.2 Numerical examples 

AIS data It is instructive to examine the outcome of a data fitting process based on the skew t 
distribution in a few practical cases. Data on several biomedical variables from 202 athletes have been 
collected at the Australian Institute of Sport; see Cook & Weisberg (1994) for their description. 

We consider here four variables, (BMI, Bfat, ssf, LBM), which represent represent the body 
mass index, the percentage of body fat, the sum of skin folds and the lean body mass, respectively. 
A St.4 distribution has been fitted to the 202 points, and Figure 2 shows the associated Healy's plot, 
using the multivariate normal and the skew t distribution, as described at the end of Section 4.2. The 
plots indicate a satisfactory fit to the data provided by the skew t, markedly superior to the normal 
one. 

This figure matches with Figure 6 of Azzalini & Capitanio (1999), who fit a SN distribution 
to the same data. While the SN fit was definitely superior to the normal one, still there was some 
discrepancy from the identity line which has now vanished almost perfectly. 

The full list of estimated parameters is not of particular interest, but it is noteworthy that v = 
13.7, which confirms the presence of somewhat longer tails than the normal distribution. 

We do not present the analogue of Figure 5 of Azzalini & Capitanio (1999) because its graphical 
appearance in our case is not so markedly different from their Figure 5. These differences exist, but 
they become graphically evident only in a summary plot like the one reported. 

Strength of fiber-glass Smith & Naylor (1987) have reported values concerning the breaking strengths 
of 1 .5 cm long glass fibers. These data have also been considered by Jones & Faddy (2002) in asso- 
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Figure 3: Fiber-glass data: histogram and fitted skew t densities; the continuous curve refers to the density 
studied in this paper, the dashed curve refers to Jones ' model 

ciation with another form of skew t distribution, and comparison with their results is the reason for 
including this example here. 

Figure 3 shows a histogram of the data and skew t densities fitted using (26) and the Jones' 
distribution. The two parametric densities are graphically very close, and choice between the two dis- 
tributions has to be based on other aspects, rather than empirical adequacy. The Healy plot associated 
to (26), in Figure 4, confirms a satisfactory fit of the parametric distribution to the data. 

Other interesting features are indicated by twice the profile log-likelihood functions for the 
parameters a, log v, (log u, a) and (a, log v) reported in panel (a) to (d) of Figure 5, respectively. The 
contour lines for the two parameter cases are chosen to correspond to differences from the maximum 
equal to the quantiles of level 0.50, 0.75, 0.90, 0.95, 0.99 of the x| distribution; hence each contoured 
region can be interpreted as a confidence region for the pair of parameters, at the quoted confidence 
level. As anticipated earlier, these plots have a quite regular behaviour, not very far from quadratic 
functions. 

This figure also indicates quite clearly a significant negative skewness of the distribution, since 
the confidence regions up to level 95% are entirely on the left of a = 0. This conclusion is confirmed 
by the value of a divided by its standard error, which is —1.55/0.574 « —2.70, with corresponding 
p- value about 0.7%. There is also an indication of a long tail of the distribution, since v = 2.73, but 
rather higher values of v are not ruled out. These conclusions are broadly similar to those of Jones 
& Faddy (2001); from our analysis there appears to be a slightly stronger indication of significant 
skewness. 
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PP— plot for normal distribution 



PP— plot for skew— t distribution 




Figure 4: Fiber-glass data: Healy's plot when either a normal distribution (left panel) or a skew t distribution 
(right panel) is fitted to the glass data 

Martin Marietta data Our next example considers data taken from Table 1 of Butler, McDonald, 
Nelson and White (1990). Based on the arguments presented in that paper, a linear regression is 
introduced 

y = /3 + /3iCRSP + e 

where y is the excess rate of the Martin Marietta company, CRSP is an index of the excess rate of 
return for the New York market as a whole and e is an error term which in our case is taken to be 
distributed as St(0, oj 2 , a). Data over a period of n = 60 consecutive months are available. 

The resulting fitted line is shown in Figure 6, which displays the scatter-plot of the data with 
superimposed the least squares lines and the line obtained from the above model after adjusting for 
E{e}, whose intercept and slope are 

Po + E{e} = 0.0029, fa = 1.248 

respectively. These values are very close to those obtained using the skew t distribution of Jones 
(2001), and the addition of that line to Figure 6 would be barely visible, being essentially coincident 
with our line. The estimated skewness parameter is a ~ 1.246 with standardised value 1.246/0.653 ps 
1.908 and observed significance 5.6%. The estimated degrees of freedom are v = 3.32 (s.e.1.43). 

As further indication of the agreement between observed data and fitted distributions, Figure 7 
shows the histogram of the residuals after removing the line + /3iCRSP, and the fitted skew t 
density; there appears to be a satisfactory agreement between the two. Similarly to Figure 5, the shape 
of the log-likelihood function displayed a nice regular behaviour, as indicated by Figure 8. Finally, 
Figure 9 compares the Healy's plots for the normal and a skew t fitted models. Expectedly the normal 
model shows obvious inadequacy, while the skew t model behaves satisfactorily. 
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Figure 5: Fiber-glass data: twice profile negative relative log-likelihood for parameters a, logu, (logw, a) 
and (a, log v) are given in panel (a) to (d), respectively respectively 
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Figure 6: Martin Marietta data: scatterplot and fitted regression lines; the dot-dashed line is the least squares 
fit, the continuous line is the one using a skew t error term 




Figure 7: Martin Marietta data: histogram of the residuals of linear regression and fitted skew t distribution 
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Figure 8: Martin Marietta data: twice profile negative relative log-likelihood for parameters a (left panel) 
(a, logv) (right panel) 



PP— plot for normal distribution PP— plot for skew— t distribution 
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Figure 9: Martin Marietta data: Healy's plot when either a normal distribution (left panel) or a skew t distribu- 
tion (right panel) 
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6 Discussion 



A number of broadly related proposals and results have appeared in the recent literature under the 
connecting concept of the multivariate skew normal distribution. The present paper has examined the 
relationships among many of the above proposals, especially of those dealing with various formula- 
tions of skew elliptical family, by examining their connections and providing a more general approach 
to obtain several specific results. 

Among the broad class of skew elliptical family, the multivariate skew t distribution offers 
ample flexibility for adapting itself to a very wide range of practical situations, and still it maintains 
mathematical tractability and a set of appealing formal properties. Some numerical evidence and the 
availability of developed software for inference provide additional support for using the distribution 
in practical cases. Other interesting distributions have been presented in the literature, most of which 
fall under the general umbrella of density (12) and its extensions discussed at the end of Section 2. 

A wide and closely interconnected set of specific results is evolving towards a quite general 
framework. Open problems still exists, both on the probabilistic and on the inferential side of this 
area of work, as we have mentioned at various points in the paper, and additional, yet unexpected 
results will be discovered. However, what seems to us the more important direction of work, at this 
stage, is to make use of the available results in tackling real problems. This is the ultimate test to 
decide of the actual usefulness of all this work. 
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Appendix 



A Proof of Proposition 8 

Consider Y = L~ lJ (Z — £), where the d x d matrix L is such that = L T L. Then the density of Y 



is 



2f(y;I)G{w L (y)}, 

where WL(—y) = w(—L T y) = —WL(y). Using the transformation to spherical coordinates 



Yj = R \J[ sin 6 k j cos ^ , 1 < i < d - 1, F d = ^ JJ sin fc j sin 9 d - lt 

where i? > 0, 9\. G [0, 7r), for A; = 1, . . . , <Z — 2 and 0^-1 G [0, 27r), and taking into account that the 
Jacobian is r d ~ l nfc=i( sm #fc) d k we nave 



>R 



(9,r) 



= 2 c d /(r 2 )r d_1 J] (sin e fc ) d_fc-1 G{«;L(r cos 0i, r sin 0i cos 2 , . . . , r sin 0i . . . sin d _i)} 
fc=i 

cl-2 



fc=i 



where = (0i, . . . , 0d_i) T , and u/|,(0, r ) = wl(t cos 0i, r sin 9\ cos 02, • • • , r sin0i . . . sin 0^-1 )■ 
2 7r ^/2 

Notice that Crf/(r 2 )r d ~ 1 is the density of the radius in the stochastic representation (11) of 

T[d/2) 

Y(d/2) d2 

the elliptical random vector U, say, having density /, and - ^ 2 JJ(sin0fc) d_fe_1 is the spherical 

k=l 

coordinates representation of the uniform distribution on the unit sphere of M. d ; see Fang et al. (1990, 
Section 2.2.3). From Proposition 2 it follows that R 2 = Y T Y = U T U, so that the marginal density 
of R is given by 

?TT d / 2 

'*« = wm cJir2)r 

By integrating the joint density fg^ with respect to r, the marginal density of turns out to be 



roo 



d—2 

f'W = ^^US sine ^ d ~ k ~ l2 1 fr(r)G{wl(9,r)}dr 

= ^^II(sm0 fc ) <i - fc - 1 P{X<<(0,i?)}, 
71 k=i 

where X is a random variable with cumulative distribution function G. 
The conditional density of given R = r is equal to 



f m=T {9) = T -^L JjWfc)"-*- 1 W'H, 



fc=l 
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which is a density of type (12) with location parameter (tt/2, . . . , 7r/2, it). In fact for any r > and 
any matrix L the equality 

wK* -&i,tt - 2 , ■ ,7r + 6» d _i,r) = -W£,(0i,02,-- . ,0 d _i,r) 

holds true, and consequently the random variable = w* L (9, r) is symmetrically distributed around 
7r. Then, using Lemma 1 in Azzalini & Capitanio (1999, p. 599), the result follows. QED 



B Derivatives of the skew t log-likelihood 

Write U = (u\, . . . , u n ) T . Then the derivatives of (30) are obtained from 

go 

— = -2X T diag(5 Q + 7\ tg^fT 1 - X T diag(T! i L )l nV T 

dt ( \ 

— = 2 upper triangle of AU T diag(g Q + f i Qt Q ) UJ 

02 = IdQ^A^dmgCgQ + nQ^UA^ + ^nD- 1 
dt 

— = crdiag(Ti©t L )l n 



dv ^ \ dv dv 

where the components of the vectors are obtained by evaluation of the quoted expressions at each of 
the n observations, denotes the Hadamard (or element-wise) product and 

~g Q = d\ogg d (Q;v)/dQ = - 1 ^(l + Q/v)- 1 

f x = d\ogT 1 (t;v + d)/dt = T 1 (t;v + d)~ 1 t 1 {t;v + d) 

v + d^ 1 ' 2 



t L = dt(L,Q,v)/dL- 
i Q = dt(L,Q,v)/dQ 



Q + v) 
L(v + d) 1/2 



2{Q + vfli 

dlog9d ~ 1 + d)) - - d/v + ~ log(1 + QH ) 



dv 



denoting by tp the digamma function. What is not given above is an expression for 

dlogri(t(L,Q,i/);i/ + rf) 
dv 

which appears intractable and must be evaluated numerically. 

For transforming the above derivatives of D and v into those of their logarithmic transform, we 
just use the chain rule 

dt dt dt dt 

— = —— (-2D*) , -^— = — v 
dp dD* dlogv dv 
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where D* denotes the diagonal of D. 

The above expressions do not lend themselves to further differentiation. Therefore, in the nu- 
merical work described in Section 5, the observed information matrix has been obtained via numerical 
differentiation of the first derivatives. 
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